Applying the Multiple Cause Mixture Model to Text Categorization
Authors
Abstract
This paper introduces the use of the Multiple Cause Mixture Model for automatic text category assignment. Although much research has been done on text categorization, this algorithm is novel in that it is unsupervised (that is, it does not require pre-labeled training examples) and it can assign multiple category labels to each document. In this paper we present very preliminary results of applying this model to a standard test collection, evaluating it in supervised mode to facilitate comparison with other methods, and showing initial results of its use in unsupervised mode.
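The abstract does not spell out the model, so the following is a minimal Python sketch under stated assumptions. It assumes binary word-occurrence vectors d and the soft-OR (noisy-OR) mixing function often used with multiple cause mixture models: each word j is predicted with probability r_j = 1 - prod_k (1 - m_k * c_kj), where m_k in [0, 1] is the activity of cluster k for a document and c_kj in [0, 1] links cluster k to word j. Parameters are fit by projected gradient ascent on the Bernoulli log-likelihood. The helper names (`soft_or`, `gradients`, `infer_activities`, `fit_weights`, `assign_labels`), step sizes, initialization range, and the 0.5 label threshold are all hypothetical choices, not taken from the paper. `fit_weights` holds each document's activities fixed at its known labels, which is only one plausible reading of "supervised mode".

```python
import random

# Assumption: soft-OR (noisy-OR) mixing, r_j = 1 - prod_k (1 - m_k * c[k][j]).
EPS = 1e-6  # keeps predictions strictly inside (0, 1) so 1/r and 1/(1-r) are defined


def soft_or(m, c):
    """Predicted probability r_j that word j occurs, given activities m and weights c."""
    r = []
    for j in range(len(c[0])):
        prod = 1.0
        for k in range(len(m)):
            prod *= 1.0 - m[k] * c[k][j]
        r.append(min(max(1.0 - prod, EPS), 1.0 - EPS))
    return r


def gradients(d, m, c):
    """Gradients of sum_j d_j log r_j + (1 - d_j) log(1 - r_j) w.r.t. m and c."""
    K, J = len(c), len(c[0])
    r = soft_or(m, c)
    g_m = [0.0] * K
    g_c = [[0.0] * J for _ in range(K)]
    for j in range(J):
        dl_dr = d[j] / r[j] - (1 - d[j]) / (1.0 - r[j])
        for k in range(K):
            # product of (1 - m_i * c[i][j]) over the other clusters i != k
            others = 1.0
            for i in range(K):
                if i != k:
                    others *= 1.0 - m[i] * c[i][j]
            g_m[k] += dl_dr * c[k][j] * others
            g_c[k][j] = dl_dr * m[k] * others
    return g_m, g_c


def clip01(x):
    return min(max(x, 0.0), 1.0)


def infer_activities(d, c, steps=200, lr=0.1):
    """Hypothetical helper: projected gradient ascent on m, with c held fixed."""
    m = [0.5] * len(c)
    for _ in range(steps):
        g_m = gradients(d, m, c)[0]
        m = [clip01(mk + lr * g) for mk, g in zip(m, g_m)]
    return m


def fit_weights(docs, activities, epochs=100, lr=0.05, seed=0):
    """Hypothetical helper: learn c with each document's activities held fixed
    (e.g. set to its known labels, one assumed reading of 'supervised mode')."""
    rng = random.Random(seed)
    K, J = len(activities[0]), len(docs[0])
    c = [[rng.uniform(0.1, 0.3) for _ in range(J)] for _ in range(K)]
    for _ in range(epochs):
        for d, m in zip(docs, activities):
            g_c = gradients(d, m, c)[1]
            c = [[clip01(c[k][j] + lr * g_c[k][j]) for j in range(J)]
                 for k in range(K)]
    return c


def assign_labels(m, threshold=0.5):
    """Every cluster whose activity exceeds the threshold becomes a category label."""
    return [k for k, mk in enumerate(m) if mk > threshold]
```

Because every cluster whose activity clears the threshold becomes a label, a document can receive several categories at once, which is the multi-label property the abstract emphasizes. An unsupervised variant would instead start from random weights and alternate `infer_activities` over all documents with a gradient step on c; that loop is omitted from this sketch.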
Similar resources
Parametric Mixture Models for Multi-Labeled Text
We propose probabilistic generative models, called parametric mixture models (PMMs), for the multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of te...
Instance Label Prediction by Dirichlet Process Multiple Instance Learning
We propose a generative Bayesian model that predicts instance labels from weak (bag-level) supervision. We solve this problem by simultaneously modeling class distributions by Gaussian mixture models and inferring the class labels of positive bag instances that satisfy the multiple instance constraints. We employ Dirichlet process priors on mixture weights to automate model selection, and effic...
Development of a Multi-Classifier Approach for Multilingual Text Categorization
Research work related to applying text categorization methods to a monolingual corpus such as English text collections has been well established by several research teams in recent years. However, little attention has been paid to applying the techniques to classify the documents in multiple languages such as English and Chinese by means of a unified model. In this paper we propose a multi-clas...
Large margin multinomial mixture model for text categorization
In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...
Classifying Business Types from Twitter Posts Using Active Learning
Today, many companies have adopted Twitter as an additional marketing medium to advertise and promote their business activities. One possible solution for organizing a large number of posts is to classify them into predefined categories of business types. Applying normal text categorization techniques to Twitter is ineffective due to the short-length (140-character limit) characteristic of each ...